Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dynamic extraction of a parameter attribute #3143

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

ScriptSathi
Copy link
Contributor

@ScriptSathi ScriptSathi commented Nov 19, 2024

The discussion for this PR can be found here #3142

This is currently a draft.
I wanted to start the discussion before submitting the final code, as I think, it is a big enough PR.

Take this PR in the today state as a proof of concept for dynamic parameter extraction. I will continue to work on this PR until the below checks are done.

At the current state, the PR is able to

  • Extract parameters from LSM programs
  • Extract parameters from kprobes programs
  • Handle exception for uprobes as it is not part of BTF
  • Have unit tests for all cases
  • Have written documentation

Description

This PR introduce the dynamic extraction of a custom attribute

Comments

  • I do not like the use of OverwriteType parameter. But since the function argSelectorType does not receive EventConfig, it is not possible to search for the type using directly BTF types. So I suggest doing another PR before this one is merged to do so if possible. Then I'll remove the parameter. It is also not possible to add an if condition for uprobes in this function. So if the user defines the parameter, it will overwrite the type at this moment.
  • I choose to use u8 to store the offset, as the verifier does not allow me to use u16. At this moment, I don't know if it is possible to found offset > 255 in BTF structures. For my uses cases, using u8 was enough.
  • I would appreciate seeing a closer relationship between BTF and the existing types. This mean that we could directly call it from YAML instead of hard-coding it. At least doing so for primitives types, and hard-code only the complex structures

Test the PR

You can use the following config

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "lsm"
spec:
  lsmhooks:
  - hook: "bprm_check_security"
    args:
      - index: 0
        type: "linux_binprm"
        extractParam: "file.f_path.dentry.d_name.name"
        overwriteType: "string"
    selectors:
      - matchArgs:
        - index: 0
          operator: "Postfix"
          values:
            - "ls"
            - "sh"
            - "bash"

If you want to test it with more arguments, you can use bprm_creds_from_file hook. It has struct linux_binprm and struct file which are supported.

Release-note

Add dynamic extraction of a parameter attribute

@ScriptSathi ScriptSathi requested a review from a team as a code owner November 19, 2024 16:27
@ScriptSathi ScriptSathi requested a review from jrfastab November 19, 2024 16:27
@ScriptSathi ScriptSathi marked this pull request as draft November 19, 2024 16:28
@ScriptSathi ScriptSathi force-pushed the pr/ScriptSathi/add-dynamic-paramter-extraction branch from 9a9d21f to ce33ec7 Compare November 20, 2024 10:48
Copy link

netlify bot commented Nov 20, 2024

Deploy Preview for tetragon ready!

Name Link
🔨 Latest commit a04e634
🔍 Latest deploy log https://app.netlify.com/sites/tetragon/deploys/67814c806294f10008ea68ac
😎 Deploy Preview https://deploy-preview-3143--tetragon.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@ScriptSathi ScriptSathi force-pushed the pr/ScriptSathi/add-dynamic-paramter-extraction branch from ce33ec7 to 5ebb3e0 Compare November 20, 2024 16:00
@mtardy mtardy linked an issue Nov 21, 2024 that may be closed by this pull request
2 tasks
@mtardy mtardy added the release-note/major This PR introduces major new functionality label Nov 21, 2024
@ScriptSathi ScriptSathi force-pushed the pr/ScriptSathi/add-dynamic-paramter-extraction branch 2 times, most recently from e3117d7 to ba048b3 Compare November 21, 2024 10:59
Copy link
Contributor

@olsajiri olsajiri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it, left some comments

thanks for splitting the code in multiple logical changes, it's rare ;-)

we were originally thinking of using C expression parsing (there's some go lib for that) but I think that can be done later if it's ever needed, the basic parsing you did should be fine

please add tests, would be great to have some framework where we could easily add various 'ExtractParam' expressions for testing

pkg/sensors/tracing/genericuprobe.go Show resolved Hide resolved
bpf/process/types/basic.h Outdated Show resolved Hide resolved
pkg/btf/btf.go Show resolved Hide resolved
pkg/btf/btf.go Outdated Show resolved Hide resolved
bpf/process/generic_calls.h Outdated Show resolved Hide resolved
bpf/process/types/basic.h Outdated Show resolved Hide resolved
bpf/process/generic_calls.h Outdated Show resolved Hide resolved
bpf/process/generic_calls.h Outdated Show resolved Hide resolved
pkg/sensors/tracing/generickprobe.go Show resolved Hide resolved
@ScriptSathi ScriptSathi force-pushed the pr/ScriptSathi/add-dynamic-paramter-extraction branch from ba048b3 to 1fb7dbc Compare November 23, 2024 17:54
@pchaigno pchaigno removed the request for review from jrfastab December 9, 2024 18:05
@ScriptSathi ScriptSathi force-pushed the pr/ScriptSathi/add-dynamic-paramter-extraction branch 2 times, most recently from 753fd10 to 2b1153f Compare January 10, 2025 14:12
@ScriptSathi ScriptSathi changed the title Add dynamic parameter extraction Add dynamic extraction of a parameter attribute Jan 10, 2025
This commit introduces the `struct config_btf_arg_depth`. It appends
`btf_argN` to `struct event_config` as an array. This array stores the
path to the searched data.

Any `btf_argN` can have a list of elements as follow :
file->f_path is 152 bytes, so the array will look like
[{ offset: 152, is_pointer: 0, is_initialized: 1 }, ...].

The max value `MAX_BTF_ARG_DEPTH` as been set arbitrary as the verifier
need a fixed size.

In config_btf_arg, is_pointer and is_initialized are u16 because it must
match padding of 64 bits long structure

Signed-off-by: Tristan d'Audibert <[email protected]>
@ScriptSathi ScriptSathi force-pushed the pr/ScriptSathi/add-dynamic-paramter-extraction branch from 2b1153f to a04e634 Compare January 10, 2025 16:36
@ScriptSathi ScriptSathi marked this pull request as ready for review January 10, 2025 16:38
@ScriptSathi ScriptSathi requested a review from mtardy as a code owner January 10, 2025 16:38
Tristan d'Audibert added 2 commits January 10, 2025 17:54
…_argN

This commit introduces the “extract_arg_depth” function and loops over it
to move into the “arg” buffer of config->btf_argN[i]->offset by iteration.
The goal is to reach the requiered data by overwriting over arg with
the new value.

Signed-off-by: Tristan d'Audibert <[email protected]>
This commit checks if btf.Type exists in Tetragon existing types.

For instance:

`struct file` with btf is called `file` and also exists in
`GenericStringToType` with the same alias.
However, the attribute `name` in `struct qstr` has a returned type
`unsigned char` wich does not exist yet in `GenericStringToType`.

The same thing happened with `linux_binprm->filename` as the type is
`char`

Signed-off-by: Tristan d'Audibert <[email protected]>
@ScriptSathi ScriptSathi force-pushed the pr/ScriptSathi/add-dynamic-paramter-extraction branch from a04e634 to 058c828 Compare January 10, 2025 16:55
Tristan d'Audibert added 7 commits January 10, 2025 19:11
FindNextBtfType function recursively searches in a btf structure in order
to find a specific path until it reaches the target or fails.

The function also searches in embedded anonymous structures or unions to
cover as much use cases as possible. For instance, mm_struct has 2
fields; anonymous struct and another type. But you are still able to look
into the anonymous struct by specifying a path like "mm.pgd.pgd".

For instance, if the search is in the linux_binprm structure and the
path is `file.f_path.dentry.d_name.name`, the following
actions will be done.

- Look for the variable name `file` inside `linux_binprm`.

- If it matches, it stores the offset from linux_binprm where the `file`
  variable could be found.

- Then it takes the btf type `file` and searches for a parameter named
  `f_path`.

Signed-off-by: Tristan d'Audibert <[email protected]>
In order to read the data properly on BPF side, integer/long values must
use `bpf_probe_read`. So now, every time the latest type retrieved is
defined as an integer, it will be safely read before accessing the data.

Signed-off-by: Tristan d'Audibert <[email protected]>
This commit adds 2 parameters to give the ability to the user to search
for a specific variable following a "path" as follow:

```yml
...
    args:
      - index: 0
        type: "linux_binprm"
        extractParam: "file.f_path.dentry.d_name.name"
        overwriteType: "string"
...
```

The above config can be used to extract a specific parameter from the
structure at index 0.

Signed-off-by: Tristan d'Audibert <[email protected]>
The OverwriteType parameter should be deleted if `argSelectorType` can use
the `EventConfig` structure to search for the correct Type.

Signed-off-by: Tristan d'Audibert <[email protected]>
Searches if every user defined type with ExtractParam exists as a BTF
type and stores its corresponding offset in ConfigBtfArg.

This function does a basic split on `ExtractParam` to obtain the "path"
to the required data. Then, the array is given to `btf.FindNextBTFType` to
find the offset of each element until we reach the required data.
The output is stored in EventConfig to keep the normal behaviour.

For example, if the arg 0 is `struct linux_binprm` and ExtractParam is
set to `file.f_path.dentry.d_name.name`, the output will give an array
of all the offsets from their parents as follows :
[{ offset: 96, is_pointer: 0 }, { offset: 152, is_pointer: 1 }, ...]

Signed-off-by: Tristan d'Audibert <[email protected]>
This commit updates `addLsm` function to use the `ExtractParam`
and `OverwriteType` in order to look for the attributes in BTF
structure.

Signed-off-by: Tristan d'Audibert <[email protected]>
…probes

As BTF types are not defined for Uprobes, their offsets can't be found in
BTF file. Thus, with this commit, if the user defines ExtractParam /
OverwriteType, their are ignored and a warning is displayed

Signed-off-by: Tristan d'Audibert <[email protected]>
Tristan d'Audibert added 5 commits January 10, 2025 19:11
Add very similar code as in `genericlsm.go` file to handle ExtractParam
feature.

Signed-off-by: Tristan d'Audibert <[email protected]>
This commit adds 3 tests for FindNextBtfType algorithm.
The first is `testAssertEqualBtfPath` to assert that a specific
path has the exact same btfConfig as expected.
The second, "testAssertPathIsAccessible" tries to reach the path and
asserts that no errors are raised.
The third test "testAssertErrorOnInvalidPath" asserts that the error
messages raised if the path is incorrect.

The chosen test cases have embed union/anonymous structs.

The important thing to notice in case of adding new tests is to be
aware of btf changes on different architectures. Especially for
`testAssertEqualBtfPath` where Offset could be different.

To Test locally :
```
go test -exec "sudo" ./pkg/btf/
```

Signed-off-by: Tristan d'Audibert <[email protected]>
This test aims at testing the ExtractParam feature.

Tetragon use hard-coded types instead of BTF. So currently, the
ExtractParam feature does not allow to extract attributes from other
types than the few hard-coded in `generictypes.go`.

For instance, if you try to use the feature with task_struct structure,
it will fail because it is not yet hard-coded.

Signed-off-by: Tristan d'Audibert <[email protected]>
@ScriptSathi ScriptSathi force-pushed the pr/ScriptSathi/add-dynamic-paramter-extraction branch from 058c828 to 391c5a8 Compare January 10, 2025 18:32
@kkourt kkourt self-requested a review January 14, 2025 14:55
Copy link
Contributor

@olsajiri olsajiri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great, I left some comments, thanks

@@ -172,6 +185,11 @@ struct event_config {
*/
__u32 policy_id;
__u32 flags;
struct config_btf_arg btf_arg0[MAX_BTF_ARG_DEPTH];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not follow the user space and have just btf_arg[5][10]
I don't see btf_arg[1-4] being used anywhere

"fmt"
"strings"

_btf "github.com/cilium/ebpf/btf"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit _ looks strange to me, would preffer ebtf which we use on some other place

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah especially _ has a specific use in Go imports.

return processMembers(btfArgs, currentType, t.Members, pathToFound, i)
case *btf.Pointer:
(*btfArgs)[i-1].IsPointer = uint16(1)
return FindNextBtfType(btfArgs, t.Target, pathToFound, i)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is called (at beggining) with i == 0, but you are setting [i -1] field, could this crash?

Return: The last type found matching the path, or error.
*/
func FindNextBtfType(
Copy link
Contributor

@olsajiri olsajiri Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this is mostly called like:
lastBtfType, err := btf.FindNextBtfType(btfArg, rootType, partialPath, 0)
which seems confusing because it's called next but return last type,
also always called with zero index

how about we add another top level function like ResolveParam that would
prepare the path array and called local function that would recurse?

@@ -341,6 +341,10 @@ func addUprobe(spec *v1alpha1.UProbeSpec, ids []idtable.EntryID, in *addUprobeIn
return nil, fmt.Errorf("Error add arg: ArgType %s Index %d out of bounds",
a.Type, int(a.Index))
}

if a.ExtractParam != "" || a.OverwriteType != "" {
logger.GetLogger().Warnf("Extracting parameters from Uprobes is not supported, ignoring")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's fail the policy, it's wrong IMO

assert.ErrorContains(t, err, "Attribute 'fail' not found in structure ''")
}

func TestFindNextBtf(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test crashes in CI, please check

{Offset: 64, IsPointer: 1, IsInitialized: 1},
{Offset: 168, IsPointer: 1, IsInitialized: 1},
{Offset: 0, IsPointer: 1, IsInitialized: 1},
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this will fail the same way like TestBuildBtfArgFromLSMPolicy on offset values?
could you just read expected offsets from BTF?

- index: 0
type: "linux_binprm"
extractParam: "mm.owner.real_parent.real_parent.comm"
overwriteType: "string"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need overwriteType? could we maybe just do something like:

 - index: 0
   type: "string"
   extractParam: "linux_binprm.file.f_path.dentry.d_name.name"

the overwrite stuff might cause headaches later

also maybe rename extractParam to resolve? not sure ;-)

cc @kkourt

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struct config_btf_arg btf_arg1[MAX_BTF_ARG_DEPTH];
struct config_btf_arg btf_arg2[MAX_BTF_ARG_DEPTH];
struct config_btf_arg btf_arg3[MAX_BTF_ARG_DEPTH];
struct config_btf_arg btf_arg4[MAX_BTF_ARG_DEPTH];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not follow the user space part and have just btf_arg[5][MAX_BTF_ARG_DEPTH]
I did not see btf_arg[1-4] being touched elsewhere..

Copy link
Member

@mtardy mtardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been pulled-in for the docs change, here's a review. Thanks for adding docs!

"fmt"
"strings"

_btf "github.com/cilium/ebpf/btf"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah especially _ has a specific use in Go imports.

Comment on lines +405 to +407
For specific use cases, you may want to extract a specific attribute from the argument.
For instance you have `struct linux_binprm` as first argument and want to filter parent
process name, you can do it as following.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For specific use cases, you may want to extract a specific attribute from the argument.
For instance you have `struct linux_binprm` as first argument and want to filter parent
process name, you can do it as following.
You may want to extract a specific attribute from the argument structure. For instance,
you are hooking a function with `struct linux_binprm` as the first argument and want to
filter the parent process' name, you can do it as follows.

@@ -400,6 +400,61 @@ The `maxData` flag does not work with `returnCopy` flag at the moment, so it's
usable only for syscalls/functions that do not require return probe to read the
data.

### Advanced usage
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would maybe be more specific on this one, like "extracting structure fields/attribute", whatever you prefer that describes best this feature.

Comment on lines +427 to +430
The above policy will display the parent process name every time the hook is called.
The `extractParam` field is used to reach a specific data into the `struct
linux_binprm`. It is important to set `overwriteType` as well to make sure the
reached data is read correctly (as a string in this case).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The above policy will display the parent process name every time the hook is called.
The `extractParam` field is used to reach a specific data into the `struct
linux_binprm`. It is important to set `overwriteType` as well to make sure the
reached data is read correctly (as a string in this case).
The above policy will display the parent process's comm every time the hook is called.
The `extractParam` field is used to reach specific data into the `struct linux_binprm`.
It is important to set `overwriteType` as well to make sure the reached data is read
correctly (as a string in this case).

I would say comm maybe instead of name since this value can be changed by the thread itself, it's not a really safe way of identifying a process name.

- index: 0
type: "linux_binprm"
extractParam: "mm.owner.real_parent.real_parent.comm"
overwriteType: "string"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +433 to +436
- This feature requires you to know exactly what you are looking for in the attributes
of the hook parameters. For instance, if you want to have a look on what is
available inside `struct linux_binprm`, take a look at the
[Bootlin website](https://elixir.bootlin.com/linux/v6.12.5/source/include/linux/binfmts.h#L18)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- This feature requires you to know exactly what you are looking for in the attributes
of the hook parameters. For instance, if you want to have a look on what is
available inside `struct linux_binprm`, take a look at the
[Bootlin website](https://elixir.bootlin.com/linux/v6.12.5/source/include/linux/binfmts.h#L18)
- This feature requires you to read the kernel structure definitions to find what you're
looking for in the hook parameter attributes. For instance, if you want to have a look
at what is available inside `struct linux_binprm`, take a look at its definition in
[include/linux/binfmts.h](https://elixir.bootlin.com/linux/v6.12.5/source/include/linux/binfmts.h#L18)

Comment on lines +438 to +439
- Some structures are dynamic. This means that they may change at runtime. So you need to
be aware of what you are looking for.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Some structures are dynamic. This means that they may change at runtime. So you need to
be aware of what you are looking for.
- Some structures are dynamic which means that they may change at runtime.

Comment on lines +442 to +445
Tetragon can also handle some structures such as `struct file` or `struct
path` and few others. This means you can also extract the whole struct, if it is
available in the attributes of the parameter, and set the type with the correct type
like this :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Tetragon can also handle some structures such as `struct file` or `struct
path` and few others. This means you can also extract the whole struct, if it is
available in the attributes of the parameter, and set the type with the correct type
like this :
Tetragon can also handle some structures such as `struct file` or `struct
path` and a few others. This means you can also extract the whole struct if it is
available in the attributes of the parameter, and set the type with the correct type
like this :

Comment on lines +447 to +456
```yaml
- index: 0
type: "linux_binprm"
extractParam: "file"
overwriteType: "file"
# Or
# extractParam: "file.f_path"
# overwriteType: "path"
```

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
```yaml
- index: 0
type: "linux_binprm"
extractParam: "file"
overwriteType: "file"
# Or
# extractParam: "file.f_path"
# overwriteType: "path"
```
```yaml
- index: 0
type: "linux_binprm"
extractParam: "file"
overwriteType: "file"
```
Or
```yaml
- index: 0
extractParam: "file.f_path"
overwriteType: "path"
```

Mmh not sure my suggestion is correct but I would split it in two.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note/major This PR introduces major new functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add dynamic parameter extration for specific use cases
3 participants